Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve object_store docs #4978

Merged
merged 3 commits into from
Oct 30, 2023
Merged

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Oct 23, 2023

Which issue does this PR close?

Closes #3888
Relates to #4922 (review)
Part of #4879

Rationale for this change

This works on improving the docs for object_store, with a particular view to:

  • Better document the functionality
  • Articulate the vision of the crate and how it is differentiated from other offerings

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the object-store Object Store Interface label Oct 23, 2023
@tustvold tustvold force-pushed the object-store-docs branch 2 times, most recently from 0b14d77 to e274581 Compare October 27, 2023 11:41
@tustvold tustvold added the development-process Related to development process of arrow-rs label Oct 27, 2023
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great document! Inspired me a lot.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it -- thank you @tustvold -- very nice. 👏

I have a bunch of small editorial improvement suggestions but nothing that I think would prevent this PR from merging as is

//!
//! 3. Stable and predictable governance via the [Apache Arrow] project.
//! 3. Support for advanced functionality, including [ACID] reads
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you don't use ACID here as it is a database term and not as commonly applied to object stores in my experience.

For example, I think the claim that the D in ACID for Durable counts as "advanced" functionality for object stores / their APIs is confusing. It is also unclear to me what this crates adds for Isolation in the classic database sense of the word.

Perhaps a better term would be "atomic reads/writes" which also highlights the feature I think you are hinting at

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to allude that you could build ACID transactions on top, but fair enough will change

//!
//! 4. Stable and predictable governance via the [Apache Arrow] project
//!
//! 5. Very low dependency footprint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! 5. Very low dependency footprint
//! 5. Small dependency footprint (depends on only a small number of common crates)

//! # Why not a Filesystem Interface?
//!
//! Whilst this crate does provide a [`BufReader`], the [`ObjectStore`] interface mirrors the APIs
//! of object stores and not filesystems.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might help to give an example of what a filesystem API means. Perhaps something like

Suggested change
//! of object stores and not filesystems.
//! of object stores and not filesystems. For example, it purposely does not offer a `Seek` like API.

object_store/src/lib.rs Show resolved Hide resolved
Comment on lines 293 to 294
//! A common pattern, especially when reading structured datasets, is to need to fetch
//! multiple ranges of a particular object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! A common pattern, especially when reading structured datasets, is to need to fetch
//! multiple ranges of a particular object.
//! A common pattern, especially when reading structured datasets, is to fetch
//! multiple, potentially discontiguous, ranges of a particular object.

//! Ok(match self.entries.get_mut(path) {
//! Some(e) => match e.refreshed_at.elapsed() < Duration::from_secs(10) {
//! true => e.data.clone(), // Return cached data
//! false => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! false => {
//! false => { // check if remote version has changed

//! e.data.clone()
//! }
//! },
//! None => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! None => {
//! None => { // not cached, fetch value

//!
//! # Conditional Put
//!
//! The default behaviour when writing data is to upsert any existing object at the given path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! The default behaviour when writing data is to upsert any existing object at the given path.
//! The default behaviour when writing data is to upsert any existing object at the given path, overwriting any previous values.

//! # Arc::new(InMemory::new())
//! # }
//! # fn do_update(b: Bytes) -> Bytes {b}
//! # async fn conditional_put() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful to have this name exposed

Suggested change
//! # async fn conditional_put() {
//! async fn conditional_put() {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to add a comment

//! // Attempt to commit transaction
//! match store.put_opts(&path, new, PutMode::Update(version).into()).await {
//! Ok(_) => break, // Successfully committed
//! Err(Error::Precondition { .. }) => continue, // Transaction conflict, try again
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! Err(Error::Precondition { .. }) => continue, // Transaction conflict, try again
//! Err(Error::Precondition { .. }) => continue, // object has been changed, try again

@tustvold tustvold merged commit cc23cac into apache:master Oct 30, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development-process Related to development process of arrow-rs object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider looking into Apache OpenDAL as an alternative to object_store
3 participants